Using a Bilingual Context in Word-Based Statistical Machine Translation
Abstract
In statistical machine translation, phrase-based translation (PBT) models yield significantly better translation quality than single-word-based (SWB) models. PBT models translate whole phrases and thus take into account the context in which a word occurs. In this work, we propose a model that extends this context beyond phrase boundaries. The model is compared to a PBT model on the IWSLT 2007 corpus. To profit from the respective advantages of both models, we combine them, which improves translation quality on the examined corpus.
Related Resources
HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alig...
Learning to Parse Bilingual Sentences Using Bilingual Corpus and Monolingual CFG
We present a new method for learning to parse a bilingual sentence using Inversion Transduction Grammar trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the shared syntactic structures of individual sentence and the differences of word order within a syntactic structure. The method involves estimating lexical tr...
Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation
We propose a simple log-bilinear softmax-based model to deal with vocabulary expansion in machine translation. Our model uses word embeddings trained on significantly large unlabelled monolingual corpora and learns over a fairly small, word-to-word bilingual dictionary. Given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language...
N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual ...
Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints
This study investigates building a better Chinese word segmentation model for statistical machine translation. It aims at leveraging word boundary information, automatically learned by bilingual character-based alignments, to induce a preferable segmentation model. We propose dealing with the induced word boundaries as soft constraints to bias the continuous learning of a supervised CRFs mod...
Publication date: 2008